Subject:
Re: ODM Web Services: VariableID and Codes
From:
David Valentine <valentin@sdsc.edu>
Date:
Tue, 27 Feb 2007 10:36:42 -0800
To:
Jeff Horsburgh <jeffh@cc.usu.edu>
CC:
'Catharine van Ingen' <vaningen@windows.microsoft.com>, zaslavsk@sdsc.edu, jon.goodall@duke.edu, 'Kim Schreuders' <kimas@cc.usu.edu>, twhit@mail.utexas.edu, David Tarboton <dtarb@cc.usu.edu>

I'm keeping this reply to a smaller group than everyone.

Two parts:
1) status, and how to handle the request to retrieve by variableID
2) My thoughts on how the coupling of Network/Vocabulary and NetworkCode/VocabularyCode should be handled.
-----

We already have this problem where one variableCode can return multiple rows, even for the same Network/Vocabulary. An EPA code is only useful together with a sample medium. NWIS has been split into multiple services.

The situation is not well handled right now. We (well, I) need to write code that looks for attributes added to the variable request and uses them to get at the correct variable. We might also need to extend the schema and the OD to provide an official way to reference the variable, for example:
EPA:XXXXX\SampleMedium=YYYY\Method=ZZZZ
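
A minimal sketch of how a reference in that form could be split into its parts. The function name and the returned tuple layout are hypothetical and not part of the existing web services; it only assumes the "Vocabulary:Code" head followed by backslash-separated "Name=Value" qualifiers shown above.

```python
def parse_variable_reference(ref):
    # Split a reference such as VOCAB:CODE\Name=Value\Name=Value
    # into (vocabulary, code, attributes). Hypothetical helper.
    head, *qualifiers = ref.split("\\")
    vocabulary, sep, code = head.partition(":")
    if not sep or not vocabulary or not code:
        raise ValueError(f"expected 'Vocabulary:Code', got {head!r}")

    attributes = {}
    for item in qualifiers:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected 'Name=Value', got {item!r}")
        attributes[name] = value
    return vocabulary, code, attributes
```

Called with the example above, this would give the vocabulary "EPA", the code "XXXXX", and a dictionary holding SampleMedium and Method; a plain "NWIS:00060" would give an empty dictionary.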

We could also add a keyword approach, where a certain vocabulary term means "return this variableID". Such a call would be valid only for that specific OD or web service. For example, we could say that a variable submitted to a web service with a vocabulary of "VARID" would return the variableInfo with that internal variableID.

VARID:XXXXXXXX

This is what we now do for geometries in the location parameter. A network of "GEOM" says that this is a geometry, and not a siteID.
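
A rough sketch of the keyword idea, assuming the request arrives as a single "Vocabulary:Value" string and that variableIDs are integers. The function name and the returned tags are hypothetical; the real services may dispatch differently.

```python
RESERVED_VOCABULARY = "VARID"  # keyword proposed above


def classify_variable_request(variable):
    # Return ("variable_id", int) for a VARID request, otherwise
    # ("variable_code", original string). Hypothetical helper.
    prefix, sep, rest = variable.partition(":")
    if sep and prefix.upper() == RESERVED_VOCABULARY:
        # int() raises ValueError if the ID is not numeric
        return ("variable_id", int(rest))
    return ("variable_code", variable)
```

A request of "VARID:42" would be treated as a direct lookup of internal variableID 42, while "NWIS:00060" would fall through to the normal vocabulary/code handling.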


----

At the same time, this is a question of where we put the bar. Do we rely on smarter users, or smarter back-end programming (or both)?

I agree that we need tracking of Networks and Vocabulary, but I disagree on how it is proposed to be implemented.
Concatenating two items and storing them in a single column is not ideal. My problem with this approach is that the combined data does not exist separately as columns. And as noted above, in order to differentiate, we may need to include not only the Vocabulary and VariableCode, but other attributes as well. I think that we should add a column called "VariableCodeExternal". "VariableCodeExternal" would store the reference that allows the code to be matched.

I agree on the tracking of Networks and Vocabulary for the MyDB format. You want the attributes tightly bonded, since it is basically a flat file. We want the uniquely identified VariableCodes that are in VariableCodeExternal.

For the OD, I would recommend a more flexible representation. This can be accomplished by adding columns to the sites and variables tables.

Sites add:
sourceID,NetworkCode,SiteCodeExternal
Vocabulary add:
sourceID,VocabularyCode,VariableCodeExternal

It has the same effect, since a Vocabulary and VariableCode are in the same row. But we have added provenance to the code with the sourceID, and made the programmer responsible for doing the splitting and combining. And we have not lost information by combining multiple fields into a single column; we only generate a unique reference, XxxxCodeExternal. The unique reference is really a convenience, since the programmer should split the input into its component parts and query a table in the database for the appropriate value.
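
To illustrate the split-and-query idea, here is a small sketch using an in-memory SQLite table. The table layout is an assumption for illustration only (it is not the actual OD schema), the SourceID values are made up, and the function names are hypothetical; the two sample codes are the ones from Jeff's message below.

```python
import sqlite3


def make_demo_db():
    # Build a throwaway Variables table with the proposed extra columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Variables (
               VariableID INTEGER PRIMARY KEY,
               SourceID INTEGER,
               VocabularyCode TEXT,
               VariableCode TEXT,
               VariableCodeExternal TEXT)"""
    )
    rows = [
        (1, 1, "NWIS", "00010", "NWIS:00010"),  # SourceID values are made up
        (2, 2, "USU", "10", "USU:10"),
    ]
    conn.executemany("INSERT INTO Variables VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_variable_ids(conn, external_code):
    # Split the incoming reference, then query on the separate columns
    # rather than matching the concatenated string.
    vocabulary, sep, code = external_code.partition(":")
    if not sep:
        raise ValueError(f"expected 'Vocabulary:Code', got {external_code!r}")
    cursor = conn.execute(
        "SELECT VariableID FROM Variables "
        "WHERE VocabularyCode = ? AND VariableCode = ? "
        "ORDER BY VariableID",
        (vocabulary, code),
    )
    return [row[0] for row in cursor]
```

Because the lookup returns a list, a code that matches more than one row (the ambiguity discussed in the first part) shows up as several variableIDs instead of being silently collapsed to one.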



Jeff Horsburgh wrote:
>
> David V. and Ilya,
>
> I downloaded the ODM web services and got them up and running on a development machine. It took me a while to figure out that I had to add permissions for the ASP.Net account to my SQL Server database, so you may want to add that to the instructions. I was happy that it didn't take me too long to get things running! However, I have come up with a couple of issues because I can only currently get one of the methods to work on my testing database.
>
> When I did my ODM Tools demo on the HIS conference call, David Maidment requested that for VariableCodes and SiteCodes I put in the ODM database a prefix that indicates where those codes come from. An example of how I went about this is the following for water temperature from the USGS NWIS system:
>
> SiteCode: NWIS:10109000
>
> SiteName: Logan River Near Logan, UT
>
> VariableCode: NWIS:00010
>
> VariableName: Temperature, water
>
> The above is what is in my testing database. Now, to the first issue - In the ODM Web services you tack on an additional network code to the SiteCode and VariableCode and require it in the parameter for the method calls. For example, to use GetSiteInfo for the above I would have to pass it: ODM:NWIS:10109000 as the site code where the ODM part comes from the setting in the web.config file. This seems to work for the GetSiteInfo method, but I can't get any of the other methods to work unless I remove the NWIS part of the string. For example, if I change my database so that the VariableCode for temperature is 00010 instead of NWIS:00010, the GetVariableInfo method will work.
>
> Up to this point it has probably been OK to assume that the data within an ODM instance comes from a single network (i.e., NWIS, STORET, etc.). I assume that is how you have the other web service catalogs set up (i.e., a separate ODM database for each), and I also assume that is why you have the network information in the web.config file of the web services and not based on information in the database. However, these are just assumptions since I don't really know what is going on under the hood. For the test beds and for observatories, though, it is very likely that they will have multiple networks contributing data to the same instance of ODM. Hence the network information needs to be with the SiteCodes and VariableCodes in the database and not on top of the web services. Based on David Maidment's request that we identify where the VariableCodes come from, if we have USGS data and data collected by USU in a single ODM database, and both are collecting water temperature data, we might have something like:
>
> VariableCode: NWIS:00010 - for water temperature data collected by USGS
>
> VariableCode: USU:10 - for water temperature data collected by USU
>
> The above also brings me to the second issue. The VariableCodes in the ODM are not necessarily unique. For example I can get NWIS:00060 from the USGS NWIS daily values, I can get NWIS:00060 from the NWIS realtime data, and I can get NWIS:00060 from the instantaneous irregular data. If the test bed people choose to put all of this in the same ODM database, simply passing a single VariableCode (i.e., NWIS:00060) to the web service would lead to ambiguous results - which one do you return? In each of these cases, the Variables represented by a VariableCode of NWIS:00060 are not uniquely distinguished by their VariableCode - but they are uniquely qualified by a combination of attributes in the Variables table, such as their TimeSupport, their DataType, their ValueType, etc. ODM does not use the USGS convention of collapsing all variable attributes onto one VariableCode. We do have an essentially equivalent concept in the VariableID field, but the VariableIDs are somewhat arbitrary since they are just unique integers and could be different from ODM to ODM.
>
> Many of the test beds may choose not to put USGS data in their ODM database because you have the NWIS web services, but an equally likely scenario is the following. A test bed PI at USU is collecting continuous water temperature data. He puts the raw (Level 0) data into ODM with a VariableCode of USU:10 which is a hypothetical variable code that he assigns to the raw water temperature data. He then uses ODM Tools to create a Level 1 quality controlled data series from the Level 0 data. This new data series is added to the same ODM instance, and yes, he assigns it the same VariableCode (USU:10). The only thing that has changed is the QualityControlLevel of the data. He then creates a daily average temperature data series from the Level 1 data. Again, he assigns it a VariableCode of USU:10 because the variable didn't change, but the DataType, QualityControlLevel, the Method, and the TimeSupport changed. You get the idea.
>
> It seems that we have a couple of options: 1) we make the web services accept additional parameters in the GetVariableInfo and GetValues calls such that an ODM data series can be uniquely identified, or 2) we make more robust data discovery methods that allow us to return a VariableID associated with the set of attributes that make a Variable unique (i.e., VariableCode, VariableUnits, SampleMedium, ValueType, IsRegular, TimeSupport, DataType, GeneralCategory) and then use the VariableID in the GetVariableInfo and GetValues call.
>
> Sorry for the long email. What do you guys think?
>
> Jeff Horsburgh
>
> Environmental Management Research Group
>
> Utah Water Research Laboratory
>
> 8200 Old Main Hill
>
> Logan, UT 84322-8200
>
> Phone: 435-797-2946
>
> Fax: 435-797-3663
>
> jeff.horsburgh@usu.edu <mailto:jeff.horsburgh@usu.edu>
>


-- 
David Valentine
GIS Programmer
Room 469
San Diego Supercomputer Center
Univ of Calif, San Diego MC 0505
La Jolla, CA 92093-0505

phone: 858-822-0923
email: valentin@sdsc.edu




